show that the two versions of pHd can behave completely differently in the 
presence of certain observational types. Our results also provide evidence 
cal data, we consider the influence function in the empirical setting for the 
efficient detection of influential observations in practice. 

1. Introduction 

Dimension reduction methods have increased in popularity in recent times 
due to an abundance of high-dimensional data. The increased acceptance of 
such methods gives rise to the need for a better understanding of the 
sensitivity of the associated estimators. For some dimension reduction 
methods this sensitivity is not yet well understood and, as a consequence, 
diagnostics for detecting influential observations are lacking. The purpose 
of this paper is to compare the sensitivity of two related, yet competing, 
dimension reduction methods and to provide an influence diagnostic that is 
useful in practice. 

Consider a univariate response variable Y and p-dimensional predictor vector 
X. In the regression setting, when p is large it may be difficult to visually 
determine the complex structure relating Y and X due to our own inability 
to visualize data in more than a few dimensions. As such, dimension reduction 
methods that seek to reduce the dimension of X without loss of important 
regression information are highly valued. 

Here we examine the multiple-index model 

$$Y = f(B^\top X, \varepsilon) \qquad (1)$$

with $B = [\beta_1, \ldots, \beta_K]$ where $\beta_k$ $(k = 1, \ldots, K)$ are unknown 
p-dimensional column vectors, $\varepsilon$ is the error term with 
$\varepsilon \perp\!\!\!\perp X$ (where $\perp\!\!\!\perp$ will denote independence 
throughout), $E(\varepsilon) = 0$ and $f$ is the unknown link function. If we let 
$\Gamma = [\gamma_1, \ldots, \gamma_K]$ denote an arbitrary basis for 
$S = \mathrm{span}(\beta_1, \ldots, \beta_K)$, then dimension reduction without loss 
of information can be achieved by replacing X with $\Gamma^\top X$ when $K < p$. 
Li [13] calls S the effective dimension reduction (e.d.r.) space and we will 
follow the lead of Cook [5] in assuming that S is a central subspace in that 
it is defined at its minimum dimension. 

Many dimension reduction methods have been recently proposed that seek 
to identify S without prior knowledge of $f$ and only mild distributional 
conditions for X. These include Sliced Inverse Regression (SIR, [13]), Sliced 
Average Variance Estimates (SAVE, [6]), SIRII [14], Principal Hessian 
Directions (PHD, [15]) and Minimum Average Variance Estimation (MAVE, [22]) 
to name a few. 

Gather et al. [11] show that, at the sample level, SIR can fail in the presence 
of just one 'bad' observation; a finding supported by way of the influence 
function by Prendergast [19]. Prendergast [20] provided similar results via the 
influence function for SAVE and SIRII and showed that either of these methods or 
SIR may be the preferred choice, from a sensitivity standpoint, with respect to 
certain types of observations. Lue [17] introduced a trimming algorithm for one 
version of PHD that iteratively trimmed observations and was shown to work 
well under simulations of some perturbed models. 

Despite the fact that two different versions of PHD were introduced by Li [15], 
there has been little in the way of developing sensitivity comparisons between 
them. Cook Q notes that one of these versions may be preferable when the 
underlying model incorporates strong linear trends. The first purpose of this 
paper is to analyze and compare the sensitivity of these methods at the model. 
This allows for a deeper understanding into the detrimental effect that certain 
observational types may have in practice and allows us to explore the differences 
in the methods when dealing with such observations. As a consequence of such 
analyses, the second purpose of this paper is to introduce influence measures 
that can detect influential observations in practice. 



2. Principal Hessian directions 

Of the many recently proposed dimension reduction procedures, principal 
Hessian directions (PHD) is perhaps the most intuitive extension of existing 
methodology. Though the method was developed by Li [15] using Stein's Lemma 
[21], PHD is strongly related to Ordinary Least Squares (OLS) regression. Let 
$X \sim N_p(\mu, \Sigma)$ and suppose that the model given in (1) holds with $K = 1$. 
It can be shown that (See [H, and [Hj]), under these conditions, where 
$\mu_y = E(Y)$, $\Sigma_{xy} = E\{(Y - \mu_y)(X - \mu)\}$, and $\Sigma^{-1}\Sigma_{xy}$ 
denotes the OLS slope vector, 

$$\Sigma^{-1}\Sigma_{xy} \in S. \qquad (2)$$

Hence, in the single-index case where $K = 1$ for the model given in (1), OLS 
may be employed to derive a basis for S when the predictor variable is normally 
distributed. An exception to this is when $\Sigma^{-1}\Sigma_{xy} = 0$, in which case, 
whilst the OLS direction is trivially an element of S, the direction itself does 
not provide a basis for S. 

Let $X \sim N_p(\mu, \Sigma)$ and denote $\mu_y = E(Y)$ and 
$\Sigma_{yxx} = E\{(Y - \mu_y)(X - \mu)(X - \mu)^\top\}$. With the application of 
Stein's Lemma [21], Li [15] showed that the average Hessian matrix of $E(Y|X)$ 
is given as 

$$H_x = \Sigma^{-1}\Sigma_{yxx}\Sigma^{-1} \qquad (3)$$

where the eigenvectors corresponding to nonzero eigenvalues of $H_x$ are elements 
of S. Li also noted that adding a linear function of $B^\top X$ to Y does not change 
$H_x$ so that an alternative definition is 

$$H_x = \Sigma^{-1}\Sigma_{rxx}\Sigma^{-1} \qquad (4)$$

where $\Sigma_{rxx} = E\{r(Y,X)(X - \mu)(X - \mu)^\top\}$ and $r(Y,X)$ is the OLS 
residual function. 

The original PHD methods estimated the matrix $H_z$ based on 
$Z = \Sigma^{-1/2}(X - \mu)$ which provides an orthonormal basis for $\Sigma^{1/2}S$. 
Re-transformation using $\Sigma^{-1/2}$ could then be utilized to provide a basis 
for S. However, the eigenvectors based on non-zero eigenvalues of $H_x$ provide an 
orthonormal basis for S and, as such, all further reference throughout this 
paper to the PHD methods will be concerning estimation of $H_x$. 
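
The two definitions (3) and (4) can be turned into sample estimates in a few lines. The following is a minimal sketch, assuming NumPy is available, that plugs sample moments into (3) and (4); the function name `phd_directions` and its arguments are illustrative choices and do not come from the paper.

```python
import numpy as np

def phd_directions(x, y, n_dirs, residual_based=False):
    """Plug-in estimate of H_x; returns (eigenvalues, directions) ordered by |eigenvalue|.

    residual_based=False weights by y - mean(y) as in (3);
    residual_based=True weights by the OLS residuals as in (4).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    xc = x - x.mean(axis=0)                      # centred predictors
    s = xc.T @ xc / (n - 1)                      # sample covariance of X
    weights = y - y.mean()
    if residual_based:
        s_xy = xc.T @ weights / (n - 1)          # sample cov(X, Y)
        weights = weights - xc @ np.linalg.solve(s, s_xy)  # OLS residuals
    sigma_wxx = (xc * weights[:, None]).T @ xc / n         # weighted third moment
    s_inv = np.linalg.inv(s)
    h_x = s_inv @ sigma_wxx @ s_inv
    evals, evecs = np.linalg.eigh((h_x + h_x.T) / 2)       # symmetrise for stability
    order = np.argsort(-np.abs(evals))[:n_dirs]
    return evals[order], evecs[:, order]
```

For a simulated single-index model such as Y = X_1^2 + error, both versions should return a leading direction close to the first coordinate axis, which is a convenient sanity check before applying the sketch to real data.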

3. Perturbation analysis in the dimension reduction setting 

Consider an arbitrary distribution function F and define the contamination 
distribution, with respect to F and contaminant point w, to be 
$F_\epsilon = (1 - \epsilon)F + \epsilon\Delta_w$ where $0 < \epsilon < 1$ and 
$\Delta_w$ is the Dirac measure putting all of its mass at w. Consider a 
statistical estimator with functional t defined at F and $F_\epsilon$. The 
influence function for t at F is defined to be 

$$\mathrm{IF}(t, F; w) = \lim_{\epsilon \downarrow 0} \frac{t(F_\epsilon) - t(F)}{\epsilon}. \qquad (5)$$

The influence function approximates the relative influence of an observation w 
from a large sample generated from F on the estimator t. 
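
As a small worked illustration of the definition of the influence function (not taken from the paper), consider the mean functional. Here u is a dummy integration variable introduced only for this example; contamination at w shifts the mean linearly in $\epsilon$, so the limit is immediate.

```latex
t(F) = \int u \, dF(u), \qquad
t(F_\epsilon) = (1-\epsilon)\,t(F) + \epsilon\, w, \qquad
\mathrm{IF}(t, F; w)
  = \lim_{\epsilon \downarrow 0}
    \frac{(1-\epsilon)\,t(F) + \epsilon\, w - t(F)}{\epsilon}
  = w - t(F).
```

The influence of w on the mean is therefore unbounded in w, which is the standard example of a non-robust functional.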

Perturbation analysis in dimension reduction seeks to study the effect of 
small perturbations on detecting a correct basis for S. Let $b_k$ $(k = 1, \ldots, K)$ 
denote the functional for an e.d.r. direction estimator with, for an arbitrary 
distribution F, $\|b_k(F)\| = 1$ and $b_i(F)^\top b_j(F) = 0$ $(i \neq j)$. Also, let 
$(Y, X) \sim G$ such that the model in (1) is satisfied and 
$\mathrm{span}\{b_1(G), \ldots, b_K(G)\} = S$ such that $b_1(G), \ldots, b_K(G)$ 
provide a basis for S. 

In the dimension reduction setting define the contamination distribution 
function as 

$$G_\epsilon = (1 - \epsilon)G + \epsilon\Delta_{(y_0, x_0)} \qquad (6)$$

where $0 < \epsilon < 1$ and $\Delta_{(y_0, x_0)}$ is the Dirac measure putting all of 
its mass at the point $(y_0, x_0) \in \mathbb{R}^{p+1}$. Let 
$S_\epsilon = \mathrm{span}\{b_1(G_\epsilon), \ldots, b_K(G_\epsilon)\}$ be the 
equal-dimension perturbed equivalent of S. 

Since the basis for S is of primary relevance, a perturbation analysis seeking 
changes in $S_\epsilon$ should not simply compare S and $S_\epsilon$ column by 
column. Following the lead of Benasseni one approach is to study the angle 
between each $b_k(G_\epsilon)$ and its projection onto S. In noting that many 
measures of angle are insensitive to small perturbations, Benasseni introduced a 
measure between spans that utilized the average sine of the angle between each 
element of one basis and its projection onto the space spanned by the other. 
Benasseni then also derived the influence function for this measure based on 
eigenvector subsets of the covariance matrix estimator. 

Prendergast [19] utilized Benasseni's measure for a sensitivity analysis of SIR 
using the influence function. Prendergast [20] extended this result to include 
the methods SAVE and SIRII and provided useful sensitivity comparisons between 
these methods and SIR. For a given $(y_0, x_0)$, the influence function for this 
measure is simply the negative average of the sine of the angle between each 
perturbed direction and its projection onto the unperturbed space relative to 
$\epsilon \downarrow 0$. Hence, the sine of this angle can be seen as a relative 
increase in sine due to an $\epsilon$-perturbation. We now provide a formal 
definition of the Relative Increase in Sine with respect to the kth e.d.r. 
direction estimator. 

L.A. Prendergast and J. A. Smith/PHD sensitivity analysis 

Definition 3.1. Using the notation defined above, let $\theta_{\epsilon,k}$ denote 
the angle between $b_k(G_\epsilon)$ and its projection onto S. The Relative 
Increase in absolute Sine (RIS) for the kth direction is defined to be 

$$\mathrm{RIS}(b_k, G; y_0, x_0) = \lim_{\epsilon \downarrow 0} \frac{|\sin(\theta_{\epsilon,k})|}{\epsilon}$$

at G. 
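
The quantity inside the limit of Definition 3.1 is easy to compute for a fixed contamination level. The following is a minimal sketch in plain Python, assuming the basis vectors for S are orthonormal and supplied as lists of floats; the helper names `abs_sine_to_subspace` and `finite_eps_ris` are hypothetical and are used only for illustration.

```python
import math

def abs_sine_to_subspace(direction, basis):
    """|sin(theta)| between `direction` and its projection onto span(basis).

    `basis` is assumed to be a list of orthonormal vectors (lists of floats).
    """
    norm = math.sqrt(sum(v * v for v in direction))
    unit = [v / norm for v in direction]
    projection = [0.0] * len(unit)
    for gamma in basis:
        coef = sum(g * u for g, u in zip(gamma, unit))
        projection = [p + coef * g for p, g in zip(projection, gamma)]
    residual = [u - p for u, p in zip(unit, projection)]
    # For a unit vector, the norm of the part orthogonal to S equals |sin(theta)|.
    return math.sqrt(sum(r * r for r in residual))

def finite_eps_ris(direction_eps, basis, eps):
    """Finite-epsilon analogue of the RIS: |sin(theta_eps)| / eps."""
    return abs_sine_to_subspace(direction_eps, basis) / eps
```

A direction lying in S gives a sine of zero and a direction orthogonal to S gives a sine of one, so the ratio in `finite_eps_ris` grows without bound as the perturbed direction leaves S faster than `eps` shrinks.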



Remark 3.1. Let s denote the statistical functional such that, at an arbitrary 
distribution F, $s(F) = \sin(\theta_F)$ where $\theta_F$ is the angle between 
$b_k(F)$ and its projection onto S. Then, with $\theta_{\epsilon,k}$ defined as in 
Definition 3.1 and since $\sin(\theta_{0,k}) = 0$, then 

$$\mathrm{RIS}(b_k, G; y_0, x_0) = |\mathrm{IF}(s, G; y_0, x_0)|.$$

Remark 3.2. There is a strong link between the RIS and the influence functions 
for SIR, SAVE and SIRII considered by [19; 20] in that they are equal to 

$$-\frac{1}{K}\sum_{k=1}^{K} \mathrm{RIS}(b_k, G; y_0, x_0)$$

under the appropriate conditions for which they were defined. 

Assume $\theta_{\epsilon,k} \in [-\pi, \pi]$. The RIS has the following properties: 

i) When $\theta_{\epsilon,k} = \pm\pi$ or $\theta_{\epsilon,k} = 0$ then $b_k(G_\epsilon) \in S$ and $\mathrm{RIS}(b_k, G; y_0, x_0) = 0$. 
ii) When $\theta_{\epsilon,k} = \pm\pi/2$ then $b_k(G_\epsilon) \perp S$ and $\mathrm{RIS}(b_k, G; y_0, x_0) = \infty$. 
iii) When $b_k(G_\epsilon)$ is rotated away from S, $\mathrm{RIS}(b_k, G; y_0, x_0)$ increases. 
iv) When $b_k(G_\epsilon)$ is rotated towards S, $\mathrm{RIS}(b_k, G; y_0, x_0)$ decreases. 

Closed-form solutions to $\mathrm{RIS}(b_k, G; y_0, x_0)$ can then be used to study 
the effect that various observational types have on the kth e.d.r. direction 
estimator. This will be looked at with respect to PHD in the next section. 

4. Influence on the PHD e.d.r. space estimator 

Throughout this section assume $G_\epsilon$ and G are defined as in (6) with the 
following condition. 

Condition 4.1. For $(Y, X) \sim G$, $X \sim N_p(\mu, \Sigma)$. 

Under Condition 4.1 let $|\lambda_1| > \ldots > |\lambda_K| > 0$ denote the absolute 
nonzero eigenvalues of $H_x$ that correspond to the PHD e.d.r. directions 
$\gamma_1, \ldots, \gamma_K$ and let $\Gamma = [\gamma_1, \ldots, \gamma_K]$. The proof 
of the following Theorem can be found in the Appendix (A.1-A.3). 

Theorem 4.1. With notation defined above, let $b_k^y$ and $b_k^r$ denote the 
functionals for the kth y-based and r-based PHD e.d.r. direction estimators such 
that, at G and under Condition 4.1, $b_k^y(G) = b_k^r(G) = \gamma_k$ corresponds 
to the eigenvalue $\lambda_k$. Then, where $P_S = \Gamma\Gamma^\top$, 

$$\mathrm{RIS}(b_k^y, G; y_0, x_0) = \|(I_p - P_S)\Sigma^{-1/2}\alpha_{y,k}\| / |\lambda_k|,$$
$$\mathrm{RIS}(b_k^r, G; y_0, x_0) = \|(I_p - P_S)\Sigma^{-1/2}\alpha_{r,k}\| / |\lambda_k|$$

with 

$$\alpha_{y,k} = \{(y_0 - \mu_y)\gamma_k^\top\Sigma^{-1/2}z_0 - \lambda_k\gamma_k^\top\Sigma^{1/2}z_0 - \gamma_k^\top\Sigma^{-1}\Sigma_{xy}\}z_0 - (y_0 - \mu_y)\Sigma^{-1/2}\gamma_k,$$
$$\alpha_{r,k} = \{r_G(y_0, x_0)\gamma_k^\top\Sigma^{-1/2}z_0 - \lambda_k\gamma_k^\top\Sigma^{1/2}z_0\}z_0 - r_G(y_0, x_0)\Sigma^{-1/2}\gamma_k$$

where $z_0 = \Sigma^{-1/2}(x_0 - \mu)$, $\Sigma_{xy} = \mathrm{cov}(X, Y)$ and 
$r_G(y_0, x_0)$ is the OLS residual for $(y_0, x_0)$ corresponding to the 
regression of Y on X at G. 

Remark 4.1. The RIS measures for the PHD$_y$ and PHD$_r$ methods are equal for 
any given $(y_0, x_0)$ when $\Sigma_{xy} = 0$. This can occur when 
$Y \perp\!\!\!\perp X$ (a trivial case that is not supported under the assumption of 
$\mathrm{rank}(H_x) > 0$) or for some types of link function $f$. For example, let 
$Z = [Z_1, \ldots, Z_p]^\top \sim N(0, I_p)$ and suppose $Y = Z_1^2 + \varepsilon$ 
with $\varepsilon \perp\!\!\!\perp Z$ and $E(\varepsilon) = 0$; then 
$\Sigma_{zy} = E(YZ) = 0$. 

We now consider some examples that allow us to study the sensitivity of the 
PHD methods. 

Example 4.1. Consider the multiple-index model with $E(X) = 0$ and 
$\mathrm{cov}(X) = I_p$. Let $(y_0, x_0) = (y_0, cu)$ where $c \in \mathbb{R}$ and 
$u \in \mathbb{R}^p$, $\|u\| = 1$, $u \perp S$. Then 
$\mathrm{RIS}(b_k^r, G; y_0, x_0) = 0$ and 
$\mathrm{RIS}(b_k^y, G; y_0, x_0) = |c\Sigma_{xy}^\top\gamma_k/\lambda_k|$ for 
$k = 1, \ldots, K$. 

This example is interesting for two reasons. Firstly, despite the fact that 
both PHD$_y$ and PHD$_r$ estimate the same matrix, the two methods can behave 
completely differently with respect to certain types of observations. Secondly, 
[19; 20] showed that observations of this type can be highly influential for 
similar dimension reduction methods such as SIR, SAVE and SIRII. However, this 
is not the case with PHD$_r$ so that, with respect to observations of this type, 
PHD$_r$ is unusual. 

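Example 4.2. Consider the single-index model 

$$Y = \cos(2\beta_1^\top X - \pi/4) + \sigma\varepsilon$$

where $X \sim N_p(0, I_p)$, $\varepsilon \sim N(0, 1)$ and $\|\beta_1\| = 1$. For this 
model $\gamma_1 = \pm\beta_1$ and we take, without loss of generality, 
$\gamma_1 = \beta_1$. Here, the choice of $\sigma$ is irrelevant since 
$\mu_y = E(Y) = E[\cos(2\beta_1^\top X - \pi/4)]$, $H_x = E[(Y - \mu_y)XX^\top]$ and 
$\Sigma_{xy} = \mathrm{cov}(X, Y) = \mathrm{cov}[X, \cos(2\beta_1^\top X - \pi/4)]$ due 
to $E(\varepsilon) = 0$ and $\varepsilon \perp\!\!\!\perp X$. For this model we have 

$$\mu_y = \frac{1}{\sqrt{2}}e^{-2}, \qquad \Sigma_{xy} = \sqrt{2}e^{-2}\beta_1, \qquad \lambda_1 = -2\sqrt{2}e^{-2}$$

where, for verification, technical details can be found in the Appendix (A.4). 

Note that, since $\|\beta_1\| = 1$ then $\beta_1^\top x_0 = \|x_0\|\cos(\theta_0)$ 
where $\theta_0$ is the angle between $x_0$ and $\beta_1$. Hence, from Theorem 4.1 
we have $\mathrm{RIS}(b_1^y, G; y_0, x_0) = c_y\|x_0\|\sqrt{1 - \cos^2(\theta_0)}$ and 
$\mathrm{RIS}(b_1^r, G; y_0, x_0) = c_r\|x_0\|\sqrt{1 - \cos^2(\theta_0)}$ where 

$$c_y = \left|\left[(y_0 - \mu_y)\|x_0\|\cos(\theta_0) - \lambda_1\|x_0\|\cos(\theta_0) - \beta_1^\top\Sigma_{xy}\right] / \lambda_1\right|,$$

$$c_r = \left|\left\{\left[y_0 - (\mu_y + \beta_1^\top\Sigma_{xy}\|x_0\|\cos(\theta_0))\right]\|x_0\|\cos(\theta_0) - \lambda_1\|x_0\|\cos(\theta_0)\right\} / \lambda_1\right|.$$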

For plots of $\mathrm{RIS}(b_1^y, G; y_0, x_0)$ and $\mathrm{RIS}(b_1^r, G; y_0, x_0)$ 
we set $y_0 = \cos(2\beta_1^\top x_0 - \pi/4)$ such that $y_0|x_0$ is consistent with 
the model without error. This allows us to study the sensitivity of the methods 
with respect to typical observations. 

In Figure 1 (a) we plot $\mathrm{RIS}(b_1^y, G; y_0, x_0)$ for varying 
$\cos(\theta_0)$ and $\|x_0\|$. It is clear from this plot that just small changes 
in $\theta_0$ can result in large changes of influence; in particular with 
increasing $\|x_0\|$. It is also clear, however, that outliers in the predictor 
space, in the sense of a large $\|x_0\|$, are not necessarily highly influential 
on the e.d.r. space estimator. In fact, it is possible for outlying observations 
to have little or no influence. In plot (b) we provide a simple cross-section of 
$\mathrm{RIS}(b_1^y, G; y_0, x_0)$ where $\|x_0\| = 2$. This plot emphasizes the 
large differences in influence that can be obtained with only small rotations 
of $x_0$. 

Similarly, in Figure 1 (c) we plot $\mathrm{RIS}(b_1^r, G; y_0, x_0)$ for varying 
$\cos(\theta_0)$ and $\|x_0\|$. Again it is evident that small rotations of $x_0$ 
can effect large changes in influence on the r-based e.d.r. space estimator. 
This is again emphasized via a cross-section where $\|x_0\| = 2$ in plot (d). For 
the range of $\cos(\theta_0)$ and $\|x_0\|$ values provided here, the highest 
influence was achieved for the r-based method. However, for some types of 
observations it is clear that this method is less sensitive than the y-based 
approach. As mentioned in Example 4.1 there is zero influence on the r-based 
e.d.r. space estimator when $x_0 \perp S$. This is again emphasized in plot (d) 
whereas the same observational type has non-zero influence on the y-based 
method. 
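
The closed forms for Example 4.2 can be evaluated directly. The following is a minimal sketch in plain Python of the two RIS expressions, with $y_0$ set on the error-free model as in the plots; the function name `ris_example_42` and its arguments are illustrative and not part of the paper.

```python
import math

MU_Y = math.exp(-2) / math.sqrt(2)             # E(Y) for Example 4.2
BETA_SIGMA_XY = math.sqrt(2) * math.exp(-2)    # beta_1' Sigma_xy
LAMBDA_1 = -2 * math.sqrt(2) * math.exp(-2)    # nonzero eigenvalue of H_x

def ris_example_42(cos_theta, norm_x, residual_based=False):
    """RIS for the y-based (default) or r-based PHD direction in Example 4.2."""
    proj = norm_x * cos_theta                  # beta_1' x_0 = ||x_0|| cos(theta_0)
    y0 = math.cos(2 * proj - math.pi / 4)      # response consistent with the model, no error
    if residual_based:
        resid = y0 - (MU_Y + BETA_SIGMA_XY * proj)          # OLS residual at G
        c = abs((resid * proj - LAMBDA_1 * proj) / LAMBDA_1)
    else:
        c = abs(((y0 - MU_Y) * proj - LAMBDA_1 * proj - BETA_SIGMA_XY) / LAMBDA_1)
    return c * norm_x * math.sqrt(max(0.0, 1.0 - cos_theta ** 2))
```

Evaluating the sketch at `cos_theta = 0` (an observation orthogonal to S) with `norm_x = 2` gives zero for the r-based measure and a strictly positive value for the y-based measure, matching the behaviour described for plot (d).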



5. Sample based sensitivity 



Before we look at sample versions of the RIS we review sample versions of the 
influence function in general (see, for e.g., Q). Consider a sample of n 
observations, $w_1, \ldots, w_n$, sampled from F and let $F_n$ denote the empirical 
distribution of this sample. Also, let $F_{n,(j)}$ denote the empirical 
distribution for the sample without the jth observation. Recall the definition 
of the influence function for a statistical functional t given in (5). The 
sample influence function (SIF) for the jth observation on the estimator t is 
achieved by replacing $F_\epsilon$ with $F_n$ and F with $F_{n,(j)}$ such that 
$\mathrm{SIF}(t, F_n; w_j) = (n - 1)\{t(F_n) - t(F_{n,(j)})\}$. An approximating 
empirical version of the SIF can be achieved by replacing F with $F_n$ in a 
closed-form derivation of the influence function. This approximating version 
is often referred to as the empirical influence function (EIF) and depends only 
on estimates at $F_n$ and the observation $w_j$. 
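
The difference between the SIF and the EIF is easiest to see on a simple functional. The following is a minimal sketch in plain Python using the mean, for which the influence function is $w - t(F)$; the helper names are hypothetical and chosen only for this illustration.

```python
def sample_influence(estimator, sample, j):
    """SIF(t, F_n; w_j) = (n - 1) * {t(F_n) - t(F_n,(j))} for a generic estimator."""
    n = len(sample)
    reduced = sample[:j] + sample[j + 1:]      # sample without the jth observation
    return (n - 1) * (estimator(sample) - estimator(reduced))

def mean(values):
    return sum(values) / len(values)

def empirical_influence_mean(sample, j):
    """EIF for the mean: the influence function w - t(F) with F replaced by F_n."""
    return sample[j] - mean(sample)
```

For the mean the two versions coincide exactly, so `sample_influence(mean, sample, j)` equals `empirical_influence_mean(sample, j)` up to rounding. For nonlinear functionals such as eigenvectors they generally differ, which is what motivates comparing sample and empirical measures.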



5.1. Sample versions of the RIS 

Due to the link between the RIS and the influence function (see Remark 3.1) 
sample versions based on the SIF and EIF of the RIS will now be introduced to 
detect influential observations in practice. Let $\{(y_i, x_i) : i = 1, \ldots, n\}$ 
denote a sample of n observations with sample mean and covariance of the 
$x_i$'s given as $\bar{x}$ and S, and sample mean of the $y_i$'s given as 
$\bar{y}$. For this sample, let $G_n$ denote the empirical distribution and let 
$G_{n,(j)}$ denote the empirical distribution with the jth observation removed. 
Also, let $\hat\Gamma_y = [\hat\gamma_{y,1}, \ldots, \hat\gamma_{y,K}]$ denote the 
estimated basis for S at $G_n$ for y-based PHD and similarly denote 
$\hat\Gamma_r = [\hat\gamma_{r,1}, \ldots, \hat\gamma_{r,K}]$ for r-based with 
$\hat{P}_y = \hat\Gamma_y\hat\Gamma_y^\top$ and $\hat{P}_r = \hat\Gamma_r\hat\Gamma_r^\top$. 
Also suppose that $\hat\gamma_{y,k}$ and $\hat\gamma_{r,k}$ are associated with the 
eigenvalues $\hat\lambda_{y,k}$ and $\hat\lambda_{r,k}$ respectively. 

Let $\hat\theta_{y,k,(j)}$ denote the angle between the kth y-based estimated 
e.d.r. direction at $G_{n,(j)}$ (i.e. without the jth observation) and its 
projection with respect to $\hat{P}_y$ onto the space spanned by the columns of 
$\hat\Gamma_y$. Then the sample RIS for the jth observation is 

$$\mathrm{SRIS}_{y,k}(y_j, x_j) = (n - 1)\,|\sin(\hat\theta_{y,k,(j)})|$$

and similarly, we define 

$$\mathrm{SRIS}_{r,k}(y_j, x_j) = (n - 1)\,|\sin(\hat\theta_{r,k,(j)})|$$

for the r-based approach. 
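
The sample RIS can be computed by brute-force deletion. The following is a minimal sketch, assuming NumPy and a plug-in PHD estimator built from sample moments; `phd_basis` and `sample_ris` are illustrative names rather than functions from the paper, and the sketch matches directions by eigenvalue order, which may be unreliable when estimated eigenvalues are close.

```python
import numpy as np

def phd_basis(x, y, n_dirs, residual_based=False):
    """Leading eigenvectors (by |eigenvalue|) of a plug-in estimate of H_x."""
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    s = xc.T @ xc / (n - 1)
    w = y - y.mean()
    if residual_based:
        w = w - xc @ np.linalg.solve(s, xc.T @ w / (n - 1))   # OLS residuals
    s_inv = np.linalg.inv(s)
    h = s_inv @ ((xc * w[:, None]).T @ xc / n) @ s_inv
    evals, evecs = np.linalg.eigh((h + h.T) / 2)
    order = np.argsort(-np.abs(evals))[:n_dirs]
    return evecs[:, order]

def sample_ris(x, y, n_dirs, residual_based=False):
    """(n, n_dirs) array whose (j, k) entry is (n - 1)|sin(theta_{k,(j)})|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = x.shape
    gamma_hat = phd_basis(x, y, n_dirs, residual_based)
    proj_perp = np.eye(p) - gamma_hat @ gamma_hat.T            # I_p - P_hat
    sris = np.empty((n, n_dirs))
    for j in range(n):
        keep = np.arange(n) != j                               # drop observation j
        b_loo = phd_basis(x[keep], y[keep], n_dirs, residual_based)
        # Columns are unit vectors, so the orthogonal part has norm |sin(theta)|.
        sris[j] = (n - 1) * np.linalg.norm(proj_perp @ b_loo, axis=0)
    return sris
```

Each call re-estimates the e.d.r. space once per deleted observation, so the cost grows linearly in n on top of the cost of a single fit.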






Two issues arise with the use of the SRIS. The first is that, whilst it may be 
employed to detect influential observations, the measure provides little 
interpretive information as to why an observation may or may not be 
influential. The second issue is that the e.d.r. space needs to be estimated 
n + 1 times; once each at $G_n, G_{n,(1)}, \ldots, G_{n,(n)}$. An alternative is to 
approximate the SRIS by replacing G with $G_n$ in the RIS to obtain a version 
that replaces the unknown parameters with their respective estimates at $G_n$. 
We will let these y and r-based PHD empirical measures be denoted as 
$\mathrm{ERIS}_{y,k}(y_j, x_j)$ and $\mathrm{ERIS}_{r,k}(y_j, x_j)$ respectively. 

The empirical approximations to the sample influence measures may not offer 
a reasonable approximation to the sample measures when n is small [20]. 
Prendergast [20] then introduced a hybrid measure that utilized both the 
empirical and sample measures which improved the approximation whilst 
retaining the efficiency and interpretative strengths of the empirical measure. 
For example, from the Appendix, we have 
$\mathrm{RIS}(b_k^y, G; y_0, x_0) = \|(I_p - P_S)\mathrm{IF}(H_y, G; y_0, x_0)\gamma_k/\lambda_k\|$ 
where $\mathrm{IF}(H_y, G; y_0, x_0)$ is the influence function for the y-based PHD 
average Hessian matrix estimator. Hence the empirical RIS is 
$\mathrm{ERIS}_{y,k}(y_j, x_j) = \|(I_p - \hat{P}_y)\mathrm{EIF}(H_y, G_n; y_j, x_j)\hat\gamma_{y,k}/\hat\lambda_{y,k}\|$ 
where $\mathrm{EIF}(H_y, G_n; y_j, x_j)$ is the empirical influence function for 
$H_y$ at $G_n$. The idea of the hybrid measure is to replace the 
$\mathrm{EIF}(H_y, G_n; y_j, x_j)$ with an efficiently computed 
$\mathrm{SIF}(H_y, G_n; y_j, x_j) = (n - 1)\{H_y(G_n) - H_y(G_{n,(j)})\}$ which is 
derived in a closed form in terms of $(y_j, x_j)$ and the estimates at $G_n$. 

Let $\hat\Sigma_{yxx}$ denote the maximum likelihood estimate of $\Sigma_{yxx}$ at 
$G_n$ and let S denote the usual unbiased estimator of $\Sigma$ at $G_n$. 
Similarly, let these estimates at $G_{n,(j)}$ be denoted $\hat\Sigma_{yxx,(j)}$ and 
$S_{(j)}$ respectively. Then, for $S_{xy}$ denoting the usual unbiased estimate at 
$G_n$ for the covariance between the $x_i$'s and $y_i$'s, it can be shown that 

$$\hat\Sigma_{yxx,(j)} = \frac{1}{n-1}\left[n\hat\Sigma_{yxx} + S_{xy}(x_j - \bar{x})^\top + (x_j - \bar{x})S_{xy}^\top + (y_j - \bar{y})\left\{S - \frac{n(n+1)}{(n-1)^2}(x_j - \bar{x})(x_j - \bar{x})^\top\right\}\right] \qquad (7)$$

which provides a closed-form solution for $\hat\Sigma_{yxx,(j)}$. This along with 
the fact that (see, for example, [10]) 

$$S_{(j)}^{-1} = \frac{n-2}{n-1}\,S^{-1/2}\left[I_p + \frac{z_j z_j^\top}{(n-1)^2/n - z_j^\top z_j}\right]S^{-1/2},$$

where $z_j = S^{-1/2}(x_j - \bar{x})$, allows us to derive a closed form solution 
for the $\mathrm{SIF}(H_y, G_n; y_j, x_j)$. We will denote the hybrid measure that 
replaces the $\mathrm{EIF}(H_y, G_n; y_j, x_j)$ with this closed form solution for 
$\mathrm{SIF}(H_y, G_n; y_j, x_j)$ in the $\mathrm{ERIS}_{y,k}(y_j, x_j)$ as 
$\mathrm{HRIS}_{y,k}(y_j, x_j)$. Similarly we can define a version for the r-based 
approach and denote this as $\mathrm{HRIS}_{r,k}(y_j, x_j)$. 

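
The two deletion identities above can be checked numerically against direct recomputation. The following is a minimal sketch, assuming NumPy and simulated data; the variable names and the helper `sigma_yxx` are illustrative only.

```python
import numpy as np

def sigma_yxx(x, y):
    """Maximum likelihood (divide-by-n) estimate of E{(Y - mu_y)(X - mu)(X - mu)'}."""
    xc = x - x.mean(axis=0)
    return (xc * (y - y.mean())[:, None]).T @ xc / len(y)

rng = np.random.default_rng(1)
n, p, j = 30, 3, 4                               # sample size, dimension, deleted index
x = rng.standard_normal((n, p))
y = rng.standard_normal(n)

xbar, ybar = x.mean(axis=0), y.mean()
s = np.cov(x, rowvar=False)                      # unbiased covariance estimate S
s_xy = (x - xbar).T @ (y - ybar) / (n - 1)       # unbiased estimate of cov(X, Y)
d = x[j] - xbar

# Closed form (7) for the leave-one-out third-moment matrix.
closed_yxx = (n * sigma_yxx(x, y) + np.outer(s_xy, d) + np.outer(d, s_xy)
              + (y[j] - ybar) * (s - n * (n + 1) / (n - 1) ** 2 * np.outer(d, d))) / (n - 1)

# Closed form for the inverse of the leave-one-out covariance matrix.
evals, evecs = np.linalg.eigh(s)
s_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
z = s_inv_half @ d
closed_inv = (n - 2) / (n - 1) * s_inv_half @ (
    np.eye(p) + np.outer(z, z) / ((n - 1) ** 2 / n - z @ z)) @ s_inv_half

keep = np.arange(n) != j
direct_yxx = sigma_yxx(x[keep], y[keep])
direct_inv = np.linalg.inv(np.cov(x[keep], rowvar=False))
```

Comparing `closed_yxx` with `direct_yxx` and `closed_inv` with `direct_inv` (for example with `np.allclose`) confirms that the leave-one-out quantities can be obtained from full-sample estimates without refitting.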
For comparative purposes we will also consider the Mahalanobis Distance 
(MD) as a measure of outlyingness for observations in the predictor space. For 
the ith observation this is given as 

$$\mathrm{MD}(x_i) = \sqrt{(x_i - \bar{x})^\top S^{-1}(x_i - \bar{x})}.$$
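
The Mahalanobis Distance can be computed for all observations at once. The following is a minimal sketch, assuming NumPy and at least two predictors; the function name `mahalanobis_distances` is an illustrative choice.

```python
import numpy as np

def mahalanobis_distances(x):
    """MD(x_i) for every row of x, using the sample mean and unbiased covariance S."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(x, rowvar=False))
    # Quadratic form (x_i - xbar)' S^{-1} (x_i - xbar) for each row i.
    return np.sqrt(np.einsum("ij,jk,ik->i", xc, s_inv, xc))
```

A useful sanity check is that the squared distances sum to (n - 1)p when S is the unbiased estimate, since the sum equals the trace of $S^{-1}(n-1)S$.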

We now consider an example that looks at the usefulness of these influence 
measures in practice. 



5.2. Hitter's data example 

The Hitter's data set, first published in Sports Illustrated (April 20, 1987), 
contains seventeen quantitative variables concerning regular and leading 
substitute hitters competing in American major league baseball in 1986. The 
response is the log of the salary variable; individuals whose salary was not 
recorded were omitted, leaving a total of n = 263 observations. [17] also 
applied PHD to these data. The three largest absolute eigenvalues for PHD$_y$ 
are 0.0314, 0.0238, and 0.0060 and, as such, we choose K = 2. 



Fig 2. Plots of (a) average SRIS, ERIS and HRIS for first two PHD$_y$ directions, 
(b) average SRIS, ERIS and HRIS for first two PHD$_r$ directions, (c) average SRIS 
for first two PHD$_y$ and PHD$_r$ directions and (d) MD values for the Hitter's 
data where i indexes the ith smallest average SRIS for the first two PHD$_y$ 
directions. Panel titles: (a) Average y-based pHd influence, (b) Average r-based 
pHd influence. 



In Figure 2 we provide plots of sample versions of RIS for PHD$_y$ and PHD$_r$ 
and the MD values for the Hitter's data. For clarity, all data in the plots are 
ordered according to the size of the PHD$_y$ RIS values such that i indexes the 
ith smallest average of the RIS values for the 1st and 2nd PHD$_y$ directions. 

Plot (a) shows that the ERIS provides a good approximation to the SRIS and 
can be used to successfully detect influential observations for this example 
with respect to PHD$_y$; however, it tends to underestimate the SRIS. On the 
other hand, the HRIS in general gives an improved approximation for this data. 
Plot (b) indicates similar findings for PHD$_r$ though the ordering according to 
the PHD$_y$ values makes it difficult to draw direct comparisons. This will be 
left to the discussion of Table 1. 

In Plot (c) we provide direct comparisons between the SRIS for PHD$_y$ and 
PHD$_r$. We see that the magnitude of influence can be significantly greater for 
PHD$_r$ with the largest average SRIS for PHD$_r$ being more than three-fold the 
largest calculated for PHD$_y$. Conversely, however, it is also clear from this 
plot that, for some observations that are highly influential on the PHD$_y$ 
estimator, little influence is recorded for the PHD$_r$ estimator. This plot 
further emphasizes the difference in the methods with regards to sensitivity. 

In Plot (d) we provide the MD values for the data. Here it is evident that there 
is little tendency for outliers to be influential and vice versa when compared 
to the influence values recorded for PHD$_y$. We leave comparisons of the MD 
values with the influence on the PHD$_r$ estimator to the discussion of Table 1. 

Table 1 

Spearman Rank Correlations of SRIS versus the ERIS, HRIS and MD for the Hitter's Data. 
Results are for the 1st estimated direction, 2nd estimated direction, and the average 
influence for these two directions. 





             1st Direction           2nd Direction          Average Direction
          ERIS   HRIS    MD       ERIS   HRIS    MD       ERIS   HRIS    MD
PHD_y    0.898  0.996  0.435     0.922  0.992  0.388     0.935  0.995  0.506
PHD_r    0.912  0.999  0.388     0.776  0.946  0.544     0.821  0.952  0.564


In Table 1 we provide further comparisons between the sample versions of 
the RIS for PHD$_y$ and PHD$_r$ using Spearman Rank Correlations. 

For this example we see that the SRIS for each of the PHD$_y$ directions is 
approximated well by the respective ERIS values. With respect to PHD$_r$, the 
ERIS approximates the SRIS very well for the first direction and moderately well 
with respect to the second direction. The HRIS approximates the SRIS extremely 
well for each direction estimated using either method. 

The low correlations between the SRIS and MD values emphasize that not all 
outliers are influential and vice versa, therefore treating them may not necessar- 
ily benefit the estimates. As such, troublesome observations from an influence 
perspective, may lurk within otherwise typical observations. 
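
Rank correlations of the kind reported in Table 1 compare two influence measures by how similarly they order the observations. The following is a minimal sketch in plain Python, assuming no tied values; the helper names `ranks` and `spearman` are illustrative, and it is not a reproduction of the computations behind Table 1.

```python
def ranks(values):
    """Ranks 1..n of `values`, assuming there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(a, b):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n(n^2 - 1)), no ties assumed."""
    n = len(a)
    d2 = sum((u - v) ** 2 for u, v in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Because only the ordering matters, an approximation that systematically underestimates the sample influence can still achieve a rank correlation near one, provided it ranks the observations in nearly the same way.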



6. Conclusion 

We have introduced and considered an influence measure (RIS) based on 
the influence function and Benasseni's coefficient to compare two versions of 
Principal Hessian Directions (PHD$_y$ and PHD$_r$). Despite the fact that PHD$_y$ 
and PHD$_r$ seek to estimate the same Hessian matrix (and hence a basis for S) 
under assumed normality of the predictor variable, we have shown that these 
methods can behave differently in the presence of certain observational types. 

Since these differences exist in favor of either PHD$_y$ or PHD$_r$ depending on 
the observational types considered, we recommend the implementation of both 
approaches in practice and for users to give consideration to both analyses. 

The unboundedness of the influence measure for both methods also reiterates 
the findings for other dimension reduction methods by [2; 11; 18; 19; 20] which 
show that such methods can fail in the presence of certain types of 
observations. As such, considerations for the robustification of PHD$_y$ and 
PHD$_r$ should be initiated. 

We also provided details for how a measure such as the SRIS can be utilized at 
the sample level to detect influential observations in practice. Two sample 
measures, the ERIS and HRIS, were considered as efficient approximations to the 
true sample influence. The ERIS tended to underestimate the influence, in 
particular for small samples, though was typically successful at detecting 
influential observations for the example considered. For this example it is 
important to note that the HRIS provided an excellent approximation to the 
sample influence. 



Appendix A: Technical details 

A.1. Preliminaries 

For simplicity throughout, when necessary let $[\ldots]^\top$ denote the transpose 
of the preceding term enclosed in []. Let $T_y$ and T denote the functional for 
the usual mean estimators of Y and X respectively where $T_y(G) = \mu_y$ and 
$T(G) = \mu$. Also, let C denote the functional for the usual covariance matrix 
estimator where $C(G) = \Sigma$ and recall that $\mathrm{cov}_G(Y, X) = \Sigma_{yx}$ 
with $\Sigma_{xy} = \Sigma_{yx}^\top$. 

A.2. RIS proof for y-based PHD of Theorem 4.1 

Let $C_{yxx}$ denote the functional defined to be, at an arbitrary distribution 
$(Y, X) \sim F$ for which it exists, 
$C_{yxx}(F) = \int\{Y - T_y(F)\}\{X - T(F)\}\{X - T(F)\}^\top dF$. At $G_\epsilon$, 

$$C_{yxx}(G_\epsilon) = \int\{Y - T_y(G_\epsilon)\}\{X - T(G_\epsilon)\}\{X - T(G_\epsilon)\}^\top dG_\epsilon$$
$$= (1 - \epsilon)\Sigma_{yxx} + \epsilon(y_0 - \mu_y)\{(x_0 - \mu)(x_0 - \mu)^\top - \Sigma\} - \epsilon(x_0 - \mu)\Sigma_{yx} - \epsilon\Sigma_{xy}(x_0 - \mu)^\top + O(\epsilon^2). \qquad (8)$$

Let $H_y$ denote the functional for the PHD$_y$ matrix estimator where 
$H_y(G) = H_x$ and 
$H_y(G_\epsilon) = \{C(G_\epsilon)\}^{-1}C_{yxx}(G_\epsilon)\{C(G_\epsilon)\}^{-1}$. From 
(5), $\mathrm{IF}(C, G; y_0, x_0) = (x_0 - \mu)(x_0 - \mu)^\top - \Sigma$. Since 
$\{C(G_\epsilon)\}^{-1}C(G_\epsilon) = I_p$, by way of the Product Rule we have that 
$[\partial\{C(G_\epsilon)\}^{-1}/\partial\epsilon]|_{\epsilon=0}\Sigma + \Sigma^{-1}\mathrm{IF}(C, G; y_0, x_0) = 0$ 
so that 

$$[\partial\{C(G_\epsilon)\}^{-1}/\partial\epsilon]|_{\epsilon=0} = -\Sigma^{-1}\{(x_0 - \mu)(x_0 - \mu)^\top - \Sigma\}\Sigma^{-1}. \qquad (9)$$

Therefore, using the Product Rule, (8) and (9), 

$$\mathrm{IF}(H_y, G; y_0, x_0) = H_x - \left[\Sigma^{-1}(x_0 - \mu)\{(x_0 - \mu)^\top H_x + \Sigma_{yx}\Sigma^{-1}\}\right] - [\ldots]^\top + (y_0 - \mu_y)\Sigma^{-1}\{(x_0 - \mu)(x_0 - \mu)^\top - \Sigma\}\Sigma^{-1}. \qquad (10)$$
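Let $b_k^y$ $(k = 1, \ldots, K)$ denote the functional for the kth PHD$_y$ e.d.r. 
direction estimator where $b_k^y(G) = \gamma_k$ and let $\theta_{\epsilon,k}^y$ 
denote the angle between $b_k^y(G_\epsilon)$ and $P_S b_k^y(G_\epsilon)$. By utilizing 
the identity $\sin(\theta) = \sqrt{1 - \cos^2(\theta)}$, 
$|\sin(\theta_{\epsilon,k}^y)| = \|(I_p - P_S)\{b_k^y(G_\epsilon) - \gamma_k\}\|$ since 
$(I_p - P_S)\gamma_k = 0$. Therefore 

$$\mathrm{RIS}(b_k^y, G; y_0, x_0) = \lim_{\epsilon \downarrow 0}\frac{|\sin(\theta_{\epsilon,k}^y)|}{\epsilon} = \|(I_p - P_S)\mathrm{IF}(b_k^y, G; y_0, x_0)\|$$

where $\mathrm{IF}(b_k^y, G; y_0, x_0)$ is the influence function at G for the 
estimator with functional $b_k^y$. 

Results from [8; 9] may be used to show that the influence function at G for 
$b_k^y$ is (see [19]) 

$$\mathrm{IF}(b_k^y, G; y_0, x_0) = \left[\sum_{j=1,\, j \neq k}^{K}\frac{1}{\lambda_k - \lambda_j}\gamma_j\gamma_j^\top + \frac{1}{\lambda_k}(I_p - P_S)\right]\mathrm{IF}(H_y, G; y_0, x_0)\gamma_k.$$

The proof is complete by noting that, from (2), $(I_p - P_S)\Sigma^{-1}\Sigma_{xy} = 0$, 
$(I_p - P_S)\gamma_k = 0$ for $k = 1, \ldots, K$ and $(I_p - P_S)^2 = (I_p - P_S)$. 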



A.3. RIS proof for r-based PHD of Theorem 4.1 

The same conditions and definitions as those given for the RIS proof for 
PHD$_y$ are likewise employed here. Let $C_{rxx}$ be the functional defined at an 
arbitrary F to be $C_{rxx}(F) = \int r_F(Y, X)\{X - T(F)\}\{X - T(F)\}^\top dF$ where 
$r_F(Y, X)$ denotes the OLS residual function for the regression of Y on X where 
$(Y, X) \sim F$ and denote $C_{rxx}(G) = \Sigma_{rxx}$. The OLS residual functional is 
of the form $r_F(Y, X) = Y - T_y(F) - \{X - T(F)\}^\top\{C(F)\}^{-1}C_{xy}(F)$ so that, 
at $G_\epsilon$, 

$$C_{rxx}(G_\epsilon) = C_{yxx}(G_\epsilon) - \int\left[\{X - T(G_\epsilon)\}^\top\{C(G_\epsilon)\}^{-1}C_{xy}(G_\epsilon)\right]\{X - T(G_\epsilon)\}\{X - T(G_\epsilon)\}^\top dG_\epsilon.$$

Then, from (5) and since $\Sigma_{rxx} = \Sigma_{yxx}$ when $X \sim N_p(\mu, \Sigma)$, 

$$C_{rxx}(G_\epsilon) = (1 - \epsilon)\Sigma_{rxx} + \epsilon\, r_G(y_0, x_0)\{(x_0 - \mu)(x_0 - \mu)^\top - \Sigma\} + O(\epsilon^2). \qquad (11)$$

From (11), the remainder of the proof can be completed by closely following 
the proof for the PHD$_y$ RIS. 

A.4. Expectation results for Example 4.2 

Firstly, recall the power series for $e^x$ given as 

$$e^x = \sum_{m=0}^{\infty}\frac{x^m}{m!} = \sum_{m=1}^{\infty}\frac{x^{m-1}}{(m-1)!}. \qquad (12)$$

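Throughout let $Z = \beta_1^\top X$ where $Z \sim N(0, 1)$ since $\|\beta_1\| = 1$. The 
Taylor series expansion of $g_0(Z) = \cos(2Z - \pi/4)$ around $Z = 0$ gives 

$$\cos(2Z - \pi/4) = \sum_{m=0}^{\infty}\frac{g_0^{(m)}(0)}{m!}Z^m, \qquad g_0^{(2n)}(0) = \frac{(-1)^n 2^{2n}}{\sqrt{2}}. \qquad (13)$$

Using the moment generating function (mgf), $E[Z^{2n+1}] = 0$ and 
$E[Z^{2n}] = (2n)!/(2^n n!)$ for $n \in \mathbb{N}$ so that, from (12) and (13), 
$E(Y) = E[\cos(2Z - \pi/4)] = \exp(-2)/\sqrt{2}$. 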

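Since $\mathrm{cov}(X, Y) \in S$ then $\mathrm{cov}(X, Y) = c\beta_1$ for some 
$c \in \mathbb{R}$. Hence, $\mathrm{cov}(Z, Y) = \beta_1^\top\mathrm{cov}(X, Y)$ so that 
$c = \mathrm{cov}(Z, Y)$. Using a Taylor Series expansion of 
$g_1(Z) = Z\cos(2Z - \pi/4)$ around $Z = 0$, we have 

$$\mathrm{cov}(Z, Y) = E[g_1(Z)] = \sum_{n=0}^{\infty}\frac{g_1^{(2n)}(0)}{(2n)!}E[Z^{2n}] = \sum_{n=0}^{\infty}\frac{g_1^{(2n)}(0)}{2^n n!} \qquad (14)$$

since, again via the mgf, $E[Z^{2n+1}] = 0$ and $E[Z^{2n}] = (2n)!/(2^n n!)$ for 
$n \in \mathbb{N}$. We also have $g_1^{(2n)}(0) = -n2^{2n}(-1)^n/\sqrt{2}$ so that, from 
(12) and (14), $\mathrm{cov}(Z, Y) = \sqrt{2}\exp(-2)$. 

Note that $\lambda_1 = \beta_1^\top H_x\beta_1 = E[(Y - \mu_y)Z^2]$ where, for 
$g_2(Z) = Z^2\cos(2Z - \pi/4)$, the Taylor Series Expansion around $Z = 0$ for 
$E(YZ^2)$ is identical to that of (14) with $g_2^{(2n)}(0)$ replacing 
$g_1^{(2n)}(0)$. We have $g_2^{(2n)}(0) = -n(2n - 1)2^{2n-1}(-1)^n/\sqrt{2}$ so that, 
from (12) and since $E(Y) = \exp(-2)/\sqrt{2}$, 
$E[(Y - \mu_y)Z^2] = -2\sqrt{2}\exp(-2)$. 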
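
The three closed-form expectations can be cross-checked by numerical integration against the standard normal density. The following is a minimal sketch in plain Python using a trapezoidal rule on a truncated range; the helper `normal_expectation` and its default grid settings are assumptions made for this illustration.

```python
import math

def normal_expectation(g, half_width=10.0, steps=20000):
    """Approximate E[g(Z)] for Z ~ N(0, 1) with the trapezoidal rule on [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0      # end points get half weight
        total += weight * g(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2 * math.pi)

def link(z):
    return math.cos(2 * z - math.pi / 4)

mu_y = normal_expectation(link)                                   # E(Y)
cov_zy = normal_expectation(lambda z: z * link(z))                # cov(Z, Y)
lambda_1 = normal_expectation(lambda z: (link(z) - mu_y) * z * z) # E[(Y - mu_y) Z^2]
```

The numerical values agree with $\exp(-2)/\sqrt{2}$, $\sqrt{2}\exp(-2)$ and $-2\sqrt{2}\exp(-2)$ to many decimal places, which supports the series-based results of this section.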